In the heart of the bustling metropolis of New York City, Manhattan stands as an enduring symbol of urban living and aspiration. Renowned for its iconic skyline, diverse neighborhoods, and unmatched cultural richness, Manhattan has long been a magnet for both residents and investors in the real estate market. Over the years, the cost of living in this borough has witnessed significant fluctuations, evolving in response to economic, demographic, and societal shifts. Understanding the dynamics of rental prices in Manhattan is a crucial endeavor, not only for residents seeking affordable and desirable housing but also for real estate professionals and policymakers aiming to make informed decisions in this vibrant urban landscape.
This data science report embarks on a comprehensive exploration of the rental market in Manhattan, spanning both historical and contemporary perspectives. To provide valuable insights into the multifaceted realm of Manhattan rentals, this report addresses a series of pertinent questions and research inquiries:
In the following sections, this report endeavors to provide a nuanced understanding of Manhattan's rental market, leveraging data-driven insights to shed light on the factors shaping rental prices, affordability trends, and the broader implications of these findings for stakeholders in the real estate ecosystem.
# Import the Necessary Libraries
import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.figure_factory as ff
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
import scipy.stats as stats
import plotly.graph_objects as go
# Import the .csv File to Be Read
df_rent = pd.read_csv('manhattan.csv')
# Drop Irrelevant Data from the Data Frame
df_rent=df_rent.drop('borough',axis=1)
df_rent=df_rent.drop('rental_id',axis=1)
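# Merge Morningside Heights into the Upper West Side; replace() returns a new Series, so assign it back
df_rent['neighborhood'] = df_rent['neighborhood'].replace('Morningside Heights','Upper West Side')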
df_rent
| | rent | bedrooms | bathrooms | size_sqft | min_to_subway | floor | building_age_yrs | no_fee | has_roofdeck | has_washer_dryer | has_doorman | has_elevator | has_dishwasher | has_patio | has_gym | neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2550 | 0.0 | 1 | 480 | 9 | 2.0 | 17 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | Upper East Side |
| 1 | 11500 | 2.0 | 2 | 2000 | 4 | 1.0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Greenwich Village |
| 2 | 4500 | 1.0 | 1 | 916 | 2 | 51.0 | 29 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | Midtown |
| 3 | 4795 | 1.0 | 1 | 975 | 3 | 8.0 | 31 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | Greenwich Village |
| 4 | 17500 | 2.0 | 2 | 4800 | 3 | 4.0 | 136 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | Soho |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3534 | 4210 | 1.0 | 1 | 532 | 3 | 8.0 | 16 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | Chelsea |
| 3535 | 6675 | 2.0 | 2 | 988 | 5 | 10.0 | 9 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | Tribeca |
| 3536 | 1699 | 0.0 | 1 | 250 | 2 | 5.0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Little Italy |
| 3537 | 3475 | 1.0 | 1 | 651 | 6 | 5.0 | 14 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | Midtown West |
| 3538 | 4500 | 1.0 | 1 | 816 | 4 | 11.0 | 9 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | Tribeca |
3539 rows × 16 columns
# Plot Heatmap to Determine Variables with Strong Correlation
corr = df_rent.corr(numeric_only=True)  # skip the non-numeric 'neighborhood' column
plt.figure(figsize=(10,8), dpi =500)
sns.heatmap(corr,annot=True,fmt=".2f", linewidth=.5)
plt.show()
The heat map reveals several strong correlations among the variables. Square footage and rent have a strong positive correlation, with a coefficient of 0.86, so larger living spaces tend to command higher rents. The number of bathrooms and rent also correlate strongly, with a coefficient of 0.77. The number of bedrooms correlates positively with rent as well, though to a lesser extent, with a coefficient of 0.64. Taken together, these correlations suggest that square footage is the variable most strongly associated with rental prices in Manhattan, followed by the number of bathrooms and then the number of bedrooms. Correlation alone does not establish which factor drives rent, since these three variables are also correlated with one another.
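To read the same ranking off without scanning the heat map, the correlation matrix can be reduced to the single `rent` column and sorted. The sketch below is illustrative only: `demo` is a small synthetic stand-in for `df_rent` (its values are made up, not taken from the listings), so the printed coefficients will not match the ones reported above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_rent (hypothetical values, not the real listings)
rng = np.random.default_rng(0)
n = 500
size_sqft = rng.uniform(250, 2500, n)
bedrooms = np.clip(np.round(size_sqft / 600 + rng.normal(0, 0.5, n)), 0, 5)
min_to_subway = rng.integers(1, 15, n)
rent = 1000 + 4.5 * size_sqft + rng.normal(0, 800, n)

demo = pd.DataFrame({
    'rent': rent,
    'size_sqft': size_sqft,
    'bedrooms': bedrooms,
    'min_to_subway': min_to_subway,
})

# Pearson correlation of each feature with rent, strongest (by absolute value) first.
# Every column in demo is numeric, so no column filtering is needed here.
rent_corr = (
    demo.corr()['rent']
    .drop('rent')
    .sort_values(key=np.abs, ascending=False)
)
print(rent_corr.round(2))
```

Sorting by absolute value keeps strongly negative correlations near the top, which a plain descending sort would push to the bottom.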
# Distribution of Variables
df_rent.hist(figsize=(20, 20))
* For variables with 0 and 1, "0" indicates "no" and "1" indicates "yes"
The histograms provide insight into the distribution of key variables. Rent, bathrooms, square footage, floor, and building age are all right-skewed: most Manhattan rentals cluster at the lower end of these variables, with a few high-value outliers. Most rentals are one-bedroom units, followed by two-bedroom units and then studios. The majority of rentals have just one bathroom, making this the prevailing configuration in Manhattan apartments. Finally, amenities such as washer-dryers and dishwashers are absent from most of these rental units, suggesting that they are either less common or considered less essential in this market.
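Right skew can also be checked numerically rather than by eye. A minimal sketch, assuming synthetic data in place of `df_rent` (the `demo` frame and its distributions are invented for illustration): a right-skewed column has a positive skewness and a mean that sits above its median.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical): one long-tailed column, one symmetric column
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'rent': rng.lognormal(mean=8.3, sigma=0.5, size=2000),      # long right tail
    'min_to_subway': rng.normal(loc=5, scale=1.5, size=2000),   # roughly symmetric
})

# Sample skewness plus mean and median for each column
summary = pd.DataFrame({
    'skew': demo.skew(),
    'mean': demo.mean(),
    'median': demo.median(),
})
print(summary.round(2))
```

Applied to the real data, the same three statistics would be computed with `df_rent.skew(numeric_only=True)` and the matching `mean` and `median` calls.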
# Number of Available Properties in Each Area
fig = px.bar(df_rent['neighborhood'].replace('Morningside Heights','Upper West Side').value_counts())
fig.update_layout(title_text = 'Total Rental Units by Neighborhood',
showlegend = False,
xaxis_title = 'Neighborhood', #x-axis label
yaxis_title = 'Number of Rentals', #y-axis label
barmode='group',
bargap=0.01, #Gap between bars of adjacent location
bargroupgap=0.01)
fig.show()
The bar graph shows how rental listings are distributed across Manhattan neighborhoods. The Upper West Side has the most listings, followed by Midtown East, the Financial District, Flatiron, and Tribeca, in descending order. These counts admit several readings. The Upper West Side's large share of listings suggests it may be a preferred residential destination because of its amenities, accessibility, or housing options. Conversely, the smaller number of listings in Tribeca and Flatiron relative to the Upper West Side could reflect higher demand relative to supply, potentially contributing to higher rental prices in these sought-after neighborhoods.
# Analysis of Average Rental Prices
avg_rents = df_rent.groupby('neighborhood').mean().reset_index()[['neighborhood', 'rent']]
avg_rents['Rent'] = [round(r) for r in avg_rents.rent]
avg_rents.sort_values(by=['Rent'], ascending=False, inplace=True)
fig = px.bar(avg_rents,x = 'neighborhood',y='Rent')
fig.update_layout(title_text = 'Manhattan Average Rental Costs by Neighborhood',
showlegend = False,
xaxis_title = 'Neighborhood', #x-axis label
yaxis_title = 'Average Rent', #y-axis label
barmode='group',
bargap=0.01, #Gap between bars of adjacent location
bargroupgap=0.01)
fig.show()
The bar graph of average rental prices by neighborhood offers a clear perspective on the cost of living across Manhattan. Soho has the highest average rent at \$8,468, followed by Tribeca at \$7,975. Central Park South also commands a high average rent of \$7,507, making it one of the most expensive areas to rent in Manhattan, and Nolita follows with an average of \$6,687. At the other end, Manhattanville is the most affordable neighborhood, with an average rent of \$1,395. These findings highlight the considerable price disparities within Manhattan, reflecting the distinct appeal and amenities of each neighborhood.
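Because a neighborhood average can be pulled up by a single expensive listing or rest on very few listings, it can help to view the mean alongside the median and the listing count. The sketch below uses a tiny invented `demo` frame (the rents are hypothetical, not drawn from the dataset) and pandas named aggregation.

```python
import pandas as pd

# Hypothetical listings for illustration only
demo = pd.DataFrame({
    'neighborhood': ['Soho', 'Soho', 'Soho', 'Tribeca', 'Tribeca', 'Manhattanville'],
    'rent': [5200, 6100, 17500, 6700, 7300, 1400],
})

# Listing count, mean, and median rent per neighborhood, most expensive mean first
summary = (
    demo.groupby('neighborhood')['rent']
    .agg(listings='count', mean_rent='mean', median_rent='median')
    .sort_values('mean_rent', ascending=False)
    .round(0)
)
print(summary)
```

In this toy example the single 17,500 listing lifts the Soho mean well above its median, and the Manhattanville figure rests on one listing, which is the kind of caveat the extra columns make visible.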
order = list(avg_rents.neighborhood)  # neighborhoods ordered by average rent, highest first
fig = px.box(df_rent, x = 'neighborhood', y = 'rent', color='neighborhood')
fig.update_xaxes(categoryorder='mean descending')
fig.update_layout(title_text = 'Manhattan Rental Costs by Neighborhood',
showlegend = False,
xaxis_title = 'Neighborhood',
yaxis_title = 'Rent')
fig.show()
The boxplots of rent by neighborhood show how rental prices are distributed across the borough. Each box marks the median and the interquartile range, and points beyond the whiskers are outliers. Median rent varies notably across neighborhoods, and the outliers show that rents can range widely within a single neighborhood, potentially because of property type, size, or unique features.
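The outliers in a box plot are conventionally the points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule per neighborhood to an invented `demo` frame (neighborhood labels `A` and `B` and all rents are hypothetical). Plotting libraries may compute quartiles slightly differently from `Series.quantile`, so the flagged points are an approximation of what a given chart draws.

```python
import pandas as pd

# Hypothetical listings: neighborhood A contains one unusually expensive unit
demo = pd.DataFrame({
    'neighborhood': ['A'] * 6 + ['B'] * 6,
    'rent': [3000, 3200, 3400, 3600, 3800, 9000,
             2000, 2100, 2200, 2300, 2400, 2500],
})

# Per-neighborhood quartiles, broadcast back to every row
grouped = demo.groupby('neighborhood')['rent']
q1 = grouped.transform(lambda r: r.quantile(0.25))
q3 = grouped.transform(lambda r: r.quantile(0.75))
iqr = q3 - q1

# Conventional 1.5 * IQR fences
demo['is_outlier'] = (demo['rent'] < q1 - 1.5 * iqr) | (demo['rent'] > q3 + 1.5 * iqr)
print(demo[demo['is_outlier']])
```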
# Check for Linearity
fig = ff.create_scatterplotmatrix(df_rent[['rent','bathrooms','bedrooms','size_sqft']],diag='histogram',
height=1000,
width=1000)
fig.update_layout(title_text = 'Scatter Matrix of Features Correlated with Rent')
fig.show()
The scatter matrix shows a clear, positive, and roughly linear relationship between rent and square footage. As square footage increases, rent rises with it, so larger living spaces tend to come with higher monthly rents. This has implications for both renters and property owners or investors. Renters looking for more spacious accommodations should expect to pay a premium for larger apartments, while owners and investors may see potential for increased rental income in offering larger units. The relationship also underscores square footage as a critical factor in Manhattan rental prices, which can inform pricing strategies and decision-making in the rental property market.
# Build the Model to Predict Rent Prices
y = df_rent.rent
X = df_rent['size_sqft'].values.reshape(-1,1)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,shuffle=True)
model = LinearRegression()
model.fit(X_train,y_train)
accuracy = model.score(X_test,y_test)  # for LinearRegression, score() returns R^2 on the test set, not a classification accuracy
print('The model accuracy is',accuracy*100,'%')
The model accuracy is 73.86698717517729 %
The figure reported above is the model's coefficient of determination (R²) on the held-out test set rather than an accuracy in the classification sense. Because the train/test split is shuffled without a fixed seed, the score varies from run to run, ranging from about 70% to 75%. In other words, square footage alone explains roughly 70% to 75% of the variance in rent. This makes square footage a reasonably strong predictor of rental prices, but it also indicates that other factors are at play, such as location, amenities, and property condition. This finding underscores the complexity of the Manhattan rental market, where multiple factors interact to determine pricing.
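The run-to-run variation in the score can be made explicit by repeating the split with different seeds. The sketch below is a minimal illustration on synthetic data (`size_sqft`, `rent`, and the coefficients used to generate them are invented, so the scores will differ from those of the real model). It also recomputes R² by hand for the last split to show what `score()` measures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): rent grows linearly with size, plus noise
rng = np.random.default_rng(42)
size_sqft = rng.uniform(250, 2500, 1000)
rent = 1000 + 4.5 * size_sqft + rng.normal(0, 1500, 1000)
X_demo = size_sqft.reshape(-1, 1)

# Fit and score the same model on ten different random splits
scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, rent, test_size=0.25, shuffle=True, random_state=seed
    )
    demo_model = LinearRegression().fit(X_tr, y_tr)
    scores.append(demo_model.score(X_te, y_te))

print(f'R^2 across splits: min={min(scores):.3f}, max={max(scores):.3f}')

# R^2 by hand for the last split: 1 - (residual sum of squares / total sum of squares)
y_pred = demo_model.predict(X_te)
ss_res = np.sum((y_te - y_pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot
```

Passing a fixed `random_state` to `train_test_split` in the notebook would make the reported score reproducible.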
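predicted = model.predict(X)
residuals = y - predicted  # positive residual: the model underestimated the rent
ax = sns.scatterplot(x=predicted, y=residuals)  # residuals are already computed, so plot them directly
ax.axhline(0, color='gray', linestyle='--')
ax.set(title='Fitted vs. Residuals', xlabel='Predicted Values', ylabel='Residuals')
plt.show()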
The residual plot shows that the residuals spread out as the predicted rent increases, so the model is less accurate for expensive rentals. In most instances where rental prices exceed \$15,000, the predictions underestimate the actual rents. One notable outlier exists, where the model underestimated the rent by over \$10,000. These observations suggest that the model may not perform well when predicting the prices of very high-end rentals.
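One way to put numbers on the widening spread is to bin the predictions and compare the standard deviation of the residuals in each bin. The sketch below is only an illustration: `predicted` and `residuals` are synthetic arrays generated so that the noise grows with the prediction, and the band edges are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (hypothetical): residual noise scales with the predicted rent
rng = np.random.default_rng(7)
predicted = rng.uniform(2000, 18000, 3000)
residuals = rng.normal(0, 0.1 * predicted)

demo = pd.DataFrame({'predicted': predicted, 'residual': residuals})

# Group predictions into rent bands and summarize the residuals within each band
demo['band'] = pd.cut(demo['predicted'], bins=[0, 5000, 10000, 15000, 20000])
spread = demo.groupby('band', observed=True)['residual'].agg(['count', 'mean', 'std'])
print(spread.round(1))
```

A standard deviation that climbs from band to band is the numerical counterpart of the fan shape in the plot, and a mean that drifts away from zero in the top band would indicate systematic under- or overestimation there.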
# Define a Helper to Predict Rent from Square Footage
def predict_rent(sqft):
    # model.predict expects a 2-D array and returns a 1-element array; return the scalar
    return model.predict(np.array([[sqft]]))[0]
# Predict Rent Prices for Given Square Footage
value = int(input('What is the square footage of the rental?'))
rent = predict_rent(value)
rent_rounded = int(np.round(rent))
print(f'The estimated rent for a rental of {value} square feet is ${rent_rounded}.')
What is the square footage of the rental?866
The estimated rent for a rental of 866 square feet is $4734.
In conclusion, this report on Manhattan rentals yields several key findings about the local rental market. We observed a strong positive linear relationship between rental prices and square footage, underscoring the importance of property size in determining rent. The neighborhood-level analysis revealed significant variation, with Soho, Tribeca, Central Park South, and Nolita having the highest average rents and Manhattanville the lowest. The model evaluation indicated that a linear regression on square footage predicts rents with moderate success but is limited in forecasting very high-end rentals.
These findings have important implications for both renters and property investors. Renters can use this information to set realistic expectations based on location and property size, while property investors can better understand the dynamics of the Manhattan rental market to make informed decisions regarding pricing and investment.
Future research could examine the factors influencing rental prices beyond square footage, such as location, amenities, property condition, and local economic trends. More advanced predictive modeling techniques could also be employed to improve accuracy, particularly for the high-end rental segment. Overall, this report serves as a foundation for further investigation into the Manhattan rental market, offering insights for both practical decision-making and academic work in real estate economics.